Evaluation of Linguistic Features for Word Sense Disambiguation with Self-Organized Document Maps
Abstract
Word sense disambiguation automatically determines the appropriate senses of a word in context. We have previously shown that self-organized document maps have properties similar to a large-scale semantic structure that is useful for word sense disambiguation. This work evaluates the impact of different linguistic features on self-organized document maps for word sense disambiguation. The features evaluated are qualitative features, e.g. part-of-speech and syntactic labels, and quantitative features, e.g. cut-off levels for word frequency. It is shown that linguistic features help make contextual information explicit. If the training corpus is large, even contextually weak features, such as base forms, will act in concert to produce sense distinctions in a statistically significant way. However, the most important features are syntactic dependency relations and base forms annotated with part-of-speech or syntactic labels. We achieve 62.9 % ± 0.73 % correct results on the fine-grained lexical task of the English SENSEVAL-2 data. On the 96.7 % of the test cases that need no back-off to the most frequent sense, we achieve 65.7 % correct results.
Similar resources
Word Sense Disambiguation of Ambiguous Persian Words with the LDA Topic Model
Word sense disambiguation is the task of identifying the correct sense of a word in a given context from a finite set of possible senses. In this paper, a model for Farsi word sense disambiguation is presented. The model uses two groups of features: first, all words and stop words around the target word, and second, topic models. We extract topics from a Farsi corpus with Latent Dirichlet ...
Improving Wikipedia Miner Word Sense Disambiguation Algorithm
This document describes improvements to the Wikipedia Miner word sense disambiguation algorithm. The original algorithm performs very well in detecting key terms in documents and disambiguating them against Wikipedia articles. By replacing the original measure inspired by Normalized Google Distance with a measure inspired by the Jaccard coefficient and taking into account additional features, the disam...
Towards Robust High Performance Word Sense Disambiguation of English Verbs Using Rich Linguistic Features
This paper shows that our WSD system using rich linguistic features achieved high accuracy in the classification of English SENSEVAL2 verbs for both fine-grained (64.6%) and coarse-grained (73.7%) senses. We describe three specific enhancements to our treatment of rich linguistic features and present their separate and combined contributions to our system’s performance. Further experiments show...
An evaluation exercise for Romanian Word Sense Disambiguation
This paper presents the task definition, resources, participating systems, and comparative results for a Romanian Word Sense Disambiguation task, which was organized as part of the SENSEVAL-3 evaluation exercise. Five teams with a total of seven systems were drawn to this task.
Linguistic Resources and Evaluation Techniques for Evaluation of Cross-Document Automatic Content Extraction
The NIST Automatic Content Extraction (ACE) Evaluation expands its focus in 2008 to encompass the challenge of cross-document and cross-language global integration and reconciliation of information. While past ACE evaluations were limited to local (within-document) detection and disambiguation of entities, relations and events, the current evaluation adds global (cross-document and cross-langua...
Journal title:
- Computers and the Humanities
Volume 38
Publication date: 2004